Poster: nicosi Date: Oct 24, 2009 4:23pm
Forum: forums Subject: Downloading snapshots of the Internet Archive

Hello all,

how can I download snapshots of the Internet Archive?

Are there tar balls, zip files or something similar?
Or even better, torrent files with snapshots that the Internet Archive users could share and seed?

I'm asking this particularly because of today's news about the Internet Archive making its entire scanned and digitized e-book collection available on the OLPC/XO laptops. I could find no instructions on how to download that collection onto my own laptop (which is not an OLPC/XO one, btw).

I know that all the content here in the Internet Archive is in the public domain, so the literally missing link here is the download links for the Internet Archive collections. Torrent links in particular would spread, replicate, and publish the public domain content here in as many places as possible.



Poster: jonc Date: Oct 24, 2009 5:41pm
Forum: forums Subject: Re: Downloading snapshots of the Internet Archive

I doubt you really want to download the entire collection. There are over 1.5 million titles, with a scanned title typically being over 5 MB. That's 7.5 terabytes; do you really have room for that? Additionally, the archive gets its share of spam. A "snapshot" would include a lot of ads and unsolicited junk that hasn't been filtered out yet.
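The arithmetic behind that figure can be checked with a rough sketch. The function name `estimate_size_tb` is hypothetical, and decimal units (1 TB = 1,000,000 MB) are an assumption; the title count and 5 MB average are the approximate figures from this post, so the result is only a ballpark lower bound.

```python
def estimate_size_tb(titles: int, avg_mb: float) -> float:
    """Rough collection size in terabytes.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    return titles * avg_mb / 1_000_000


# Figures quoted in the post: about 1.5 million titles at about 5 MB each.
print(estimate_size_tb(1_500_000, 5))  # 7.5
```

Since both inputs are described as "over" those values, the real total would be somewhat larger than this.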

I haven't read the article you refer to, but I'm sure they are not loading the entire library on the OLPCs. They have probably arranged for them to access and display the books. What kind of laptop do you have? Are there problems downloading and viewing the titles on that?

This post was modified by jonc on 2009-10-25 00:41:00


Poster: nicosi Date: Oct 25, 2009 12:07pm
Forum: forums Subject: Re: Downloading snapshots of the Internet Archive

Hi jonc,

actually, 7.5 TB is not that much. In my desktop system I have 4 TB, which I bought for about GBP 220.00, and I would happily spend the same amount again to have 8 TB and be able to seed a torrent of the Internet Archive public domain contents. And I guess that in a couple more years, 8 TB will be common enough. The point here is to be able to easily clone the Internet Archive as a whole instead of title by title. I'd rather have my library with me than have to rely on an Internet connection all the time. I'd also rather build my own index to look up the pages, and I'd need them locally for that.

It would be possible to clone the Internet Archive on a specific day, work on the clone for some weeks to remove the spam, and only then release the spam-free clone as a public snapshot for downloading / torrenting. Snapshots could be released once or twice a year.

Here is a link to one of the news articles commenting on the plans to load (part of?) the Internet Archive e-books on the OLPCs:


Poster: jonc Date: Oct 26, 2009 7:08pm
Forum: forums Subject: Re: Downloading snapshots of the Internet Archive

They might do this if enough people asked for it. Another way might be to index the text collections. If you could find a way to make this advantageous to a local school or library (maybe they have a slow Internet connection?), IA might be very interested.

This post was modified by jonc on 2009-10-27 02:08:41